Abstract

We consider the problem of delivering content cached in a wireless network of n nodes randomly located on 
a square of area n. In the most general form, this can be analyzed by considering the $2^n \times n$-dimensional caching
capacity region of the wireless network. We provide an inner bound on this caching capacity region, and, in the 
high path-loss regime, a matching (in the scaling sense) outer bound. For large path-loss exponent, this provides 
an information-theoretic scaling characterization of the entire caching capacity region. Moreover, the proposed 
communication scheme achieving the inner bound shows that the problem of cache selection and channel coding 
can be solved separately without loss of order-optimality. 

I. Introduction 

With the continued large-scale deployment of infrastructure, wireless networking continues to be an 
area of active research. In this context, unicast and multicast traffic has been widely studied. The influence 
of caches on the network performance, on the other hand, has received considerably less attention. 
Nevertheless, the ability to replicate data at several places in the network is likely to significantly increase 
supportable data rates. In this paper, we consider the problem of characterizing achievable rates with 
caching in large wireless networks. 

In its most general form, this problem can be formulated as follows. Consider a wireless network with n 
nodes, and assume a node $w$ in the network requests a message available at the set of caches $U$ (a subset of
the $n$ nodes) at a certain rate $\lambda^{\mathsf{CA}}_{U,w}$. The collection of all $\{\lambda^{\mathsf{CA}}_{U,w}\}_{U,w}$ can be represented as a caching traffic
matrix $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$. The question is then to characterize the set of achievable caching traffic matrices
$\Lambda^{\mathsf{CA}}(n) \subset \mathbb{R}_+^{2^n \times n}$. We answer this question by providing an approximate (i.e., scaling) characterization
of this caching capacity region $\Lambda^{\mathsf{CA}}(n)$ for large wireless networks (i.e., as $n \to \infty$) under random node
placement and assuming large path-loss exponent. Our treatment is information-theoretic, i.e., we do not 
make any assumptions on the communication protocol used. 

While $\Lambda^{\mathsf{CA}}(n)$ is a high-dimensional object (namely $2^n \times n$-dimensional), we show that feasibility of
a traffic matrix $\lambda^{\mathsf{CA}}$ can be efficiently evaluated. We also provide an explicit communication scheme
achieving (in the scaling sense) the entire caching capacity region $\Lambda^{\mathsf{CA}}(n)$.

A. Related Work 

Several aspects of caching in wireless networks have been investigated in prior work. In the computer 
science literature, the wireless network is usually modeled as a graph induced by the geometry of the 
node placement. This is tantamount to making a protocol model assumption (as proposed in [1]) about the 
communication scheme used. The quantity of interest involves the distance from each node to the closest 
cache that holds the requested message. The problem of optimal cache location for multicasting from a 
single source has been investigated in [2], [3]. Optimal caching densities under uniform random demand 
have been considered in [4], [5]. Several cache replacement strategies are proposed, for example, in [6]. 

To the best of our knowledge, caching has not been directly considered in the information theory 
literature. However, it can be seen that the problem of optimally transmitting messages held at several 
caches to a destination is a special case of communicating correlated sources over a noisy network. Indeed, 
we can consider that each cache has an identical message to send to the same destination. This more 
general problem of transmitting correlated sources has received considerable attention. Unlike the situation
with point-to-point communication, for network communication problems source-channel separation does
not hold in general [7]. Hence, the problems of source and channel coding have to be considered jointly.
While for some special cases optimal communication strategies for transmitting correlated sources over a 
noisy network are known (for example, a single destination node requesting all the sources observed in 
the network with independent network links [8], [9]), the general problem is unsolved. 

Finally, a special case of the caching problem considered here, in which each destination has only a 
single cache (i.e., standard unicast traffic), has been widely studied and is by now well understood. See, 
for example, [1], [10]-[20]. 

B. Our Contribution 

We consider the general caching problem from an information-theoretic point of view. Compared to 
the prior work mentioned in the last section, there are several key differences. First, we do not make 
a protocol channel model assumption, and instead allow the use of arbitrary communication protocols 
over the wireless network. Second, we allow for general traffic demands, i.e., arbitrary number of caches, 
and arbitrary demands from each destination. Third, we do not impose that each destination requests the 
desired message from only the closest cache, nor do we impose that the entire message has to be requested 
from the same cache. Rather we allow parts of the same message to be requested from distinct caches. 

We present an achievable communication scheme for the caching problem, yielding an inner bound 
on the caching capacity region. For large values of path-loss exponent, we provide a matching (in the 
scaling sense) outer bound, proving the optimality (again in the scaling sense) of our proposed scheme. 
Together, this provides a scaling description of the entire caching capacity region of the wireless network 
in the large path-loss regime. The proposed communication scheme solves the problem of optimal cache 
selection and channel coding separately, showing that such a separation is order-optimal. 

C. Organization 

The remainder of the paper is organized as follows. In Section II we introduce the channel model as
well as notation. In Section III we present the main results of the paper. Section IV contains proofs, and
Section V concluding remarks.

II. Network Model and Notation 

Consider the square

$$A(n) \triangleq [0, \sqrt{n}]^2$$

of area $n$, and let $V(n) \subset A(n)$ be a set of $|V(n)| = n$ nodes on $A(n)$. We assume the following channel
model. The (sampled) received signal at node $v$ and time $t$ is

$$y_v[t] = \sum_{u \in V(n) \setminus \{v\}} h_{u,v}[t]\, x_u[t] + z_v[t]$$

for all $v \in V(n)$, $t \in \mathbb{N}$, and where $\{x_u[t]\}_u$ are the (sampled) signals sent by the nodes in $V(n)$ at time $t$.
Here $\{z_v[t]\}_{v,t}$ are independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian
random variables with mean 0 and variance 1, and

$$h_{u,v}[t] \triangleq r_{u,v}^{-\alpha/2} \exp\big(\sqrt{-1}\,\theta_{u,v}[t]\big)$$

for path-loss exponent $\alpha > 2$, and where $r_{u,v}$ is the Euclidean distance between $u$ and $v$. The phase
terms $\{\theta_{u,v}[t]\}_{u,v}$ are assumed to be i.i.d. with uniform distribution on $[0, 2\pi)$.¹ We either assume that

¹It is worth pointing out that recent results [20] suggest that, under certain assumptions on the location of scattering elements, for $\alpha \in (2, 3)$
and very large values of $n$, the channel model used here (in particular, the i.i.d. assumption on the phase terms) might yield results that are
too optimistic. However, in [21] the same authors show that, under different assumptions on the scatterers, the channel model used here is
still valid also for $\alpha \in (2, 3)$ and very large values of $n$. This indicates that the issue of proper channel modeling in the low path-loss regime
for very large networks is somewhat delicate and requires further investigation.
Fig. 1. Subsquares $\{A_{\ell,i}(n)\}$ with $0 \le \ell \le 2$, i.e., with $L(n) = 2$. The subsquare at level $\ell = 0$ is the area $A(n)$ itself. The subsquares at
level $\ell = 1$ are indicated by dashed lines, the subsquares at level $\ell = 2$ by dotted lines. Assume for the sake of example that the subsquares
are numbered from left to right and then from bottom to top (the precise order of numbering is immaterial). Then $V_{0,1}(n)$ are all the nodes
$V(n)$, $V_{1,1}(n)$ are the nine nodes in the lower left corner (delineated by dashed lines), and $V_{2,1}(n)$ are the three nodes in the lower left
corner (delineated by dotted lines).



$\{\theta_{u,v}[t]\}_t$ is stationary and ergodic as a function of $t$, which is called fast fading in the following, or we
assume $\{\theta_{u,v}[t]\}_t$ is constant as a function of $t$, which is called slow fading in the following. In either
case, we assume full channel state information (CSI) is available at all nodes, i.e., each node knows all
$\{h_{u,v}[t]\}_{u,v}$ at time $t$.² We also impose an average power constraint of 1 on the signal $\{x_u[t]\}_t$ for every
node $u \in V(n)$.

Partition $A(n)$ into $4^\ell$ subsquares $\{A_{\ell,i}(n)\}_{i=1}^{4^\ell}$ of sidelength $2^{-\ell}\sqrt{n}$, and let $V_{\ell,i}(n)$ be the nodes in
$A_{\ell,i}(n)$. The integer parameter $\ell$ varies between 0 and $L(n)$ defined as³

$$L(n) \triangleq \tfrac{1}{2}\log(n)\big(1 - \log^{-1/2}(n)\big).$$

The partitions at various levels $\ell$ form a dyadic decomposition of $A(n)$ as illustrated in Figure 1.
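The dyadic decomposition can be made concrete with a small helper. The following is a minimal sketch, not part of the original scheme: the function names `max_level` and `subsquare_index` are hypothetical, $L(n)$ is rounded down to an integer (the paper treats large reals as integers), and subsquares are numbered from left to right and then from bottom to top, as in the caption of Figure 1.

```python
import math


def max_level(n: int) -> int:
    """L(n) = (1/2) log2(n) (1 - log2(n)^(-1/2)), rounded down (assumes n > 2)."""
    log_n = math.log2(n)
    return int(0.5 * log_n * (1.0 - log_n ** -0.5))


def subsquare_index(x: float, y: float, n: int, level: int) -> int:
    """1-based index of the level-`level` subsquare of [0, sqrt(n)]^2 containing (x, y).

    Numbering runs left to right, then bottom to top (an assumed convention).
    """
    cells_per_side = 2 ** level
    side = math.sqrt(n) / cells_per_side  # sidelength 2^{-level} * sqrt(n)
    # Points on the upper/right boundary are assigned to the last cell.
    col = min(int(x / side), cells_per_side - 1)
    row = min(int(y / side), cells_per_side - 1)
    return row * cells_per_side + col + 1
```

For instance, with $n = 16$ the square has side 4, and at level 1 the point $(3.9, 0.1)$ falls in the second subsquare of the bottom row.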

A caching traffic matrix is an element $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$. Consider $U \subset V(n)$ and $w \in V(n)$. Assume a
message that is requested at destination node $w$ is available at all of the caches $U$. Then $\lambda^{\mathsf{CA}}_{U,w}$ denotes the
rate at which node $w$ wants to obtain the message from the caches $U$. Note that we do not impose that
any particular cache $u \in U$ provides $w$ with the desired message; rather, several of the nodes in $U$ could
provide parts of the message. Note also that $\lambda^{\mathsf{CA}}_{U,w}$ and $\lambda^{\mathsf{CA}}_{\tilde{U},w}$ could both be strictly positive for $U \neq \tilde{U}$,
i.e., the same destination could request more than one message from different collections of caches. We
assume that messages for different $(U, w)$ pairs are independent. The caching capacity region $\Lambda^{\mathsf{CA}}(n)$ of
the wireless network $V(n)$ is the set of all achievable caching traffic matrices $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$.

Example 1. Consider $V(n) = \{v_i\}_{i=1}^{4}$ with $n = 4$. Assume that $v_1$ requests a message $m_{\{v_3,v_4\},v_1}$ available
at the caches $v_3$ and $v_4$ at rate 1 bit per channel use, and an independent message $m_{\{v_3\},v_1}$ available only
at $v_3$ at a rate of 2 bits per channel use. Node $v_2$ requests a message $m_{\{v_3,v_4\},v_2}$ available at the caches $v_3$
and $v_4$ at a rate of 4 bits per channel use. The messages $m_{\{v_3,v_4\},v_1}$, $m_{\{v_3\},v_1}$, and $m_{\{v_3,v_4\},v_2}$ are assumed
to be independent. This traffic pattern can be described by a caching traffic matrix $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^4 \times 4}$ with
$\lambda^{\mathsf{CA}}_{\{v_3,v_4\},v_1} = 1$, $\lambda^{\mathsf{CA}}_{\{v_3\},v_1} = 2$, $\lambda^{\mathsf{CA}}_{\{v_3,v_4\},v_2} = 4$, and $\lambda^{\mathsf{CA}}_{U,w} = 0$ otherwise. Note that in this example node $v_1$ is
the destination for two (independent) caching messages, and nodes $v_3$ and $v_4$ serve as caches for more than
one message (but these messages are again assumed independent).
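Since a caching traffic matrix has $2^n \times n$ entries, almost all of which are typically zero, it is natural to store only the nonzero rates. The snippet below is an illustrative sketch of the traffic pattern of Example 1 as a sparse mapping; the type alias `TrafficMatrix` and the helpers `example_1_traffic` and `rate` are hypothetical names introduced here, not notation from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Sparse caching traffic matrix: (cache set U, destination w) -> rate in bits per channel use.
TrafficMatrix = Dict[Tuple[FrozenSet[str], str], float]


def example_1_traffic() -> TrafficMatrix:
    """Nonzero entries of the caching traffic matrix described in Example 1."""
    return {
        (frozenset({"v3", "v4"}), "v1"): 1.0,
        (frozenset({"v3"}), "v1"): 2.0,
        (frozenset({"v3", "v4"}), "v2"): 4.0,
    }


def rate(lam: TrafficMatrix, caches: Iterable[str], dest: str) -> float:
    """Rate requested by `dest` from the cache set `caches`; zero if not listed."""
    return lam.get((frozenset(caches), dest), 0.0)
```

Using `frozenset` keys makes the lookup independent of the order in which the caches are listed, which matches the fact that $U$ is a set.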

To simplify notation, we will assume when necessary that large reals are integers and omit $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$
operators. For the same reason, we suppress dependence on $n$ within proofs whenever this dependence is
²We make the full CSI assumption in all the converse results in this paper. Achievability can be shown to hold under weaker assumptions
on the availability of CSI. In particular, for $\alpha > 3$, no CSI is necessary, and for $\alpha \in (2, 3)$, a 2-bit quantization of the channel state
$\{\theta_{u,v}[t]\}_{u,v}$ available at all nodes at time $t$ is sufficient.

³Throughout, $\log$ and $\ln$ represent the logarithms with respect to base 2 and $e$, respectively.
Fig. 2. Construction of the tree graph $G$. We consider the same nodes as in Figure 1 with $L(n) = 2$. The leaves of $G$ are the nodes $V(n)$
of the wireless network. They are always at level $\ell = L(n) + 1$ (i.e., 3 in this example). At level $0 \le \ell \le L(n)$ in $G$, there are $4^\ell$ nodes.
The tree structure is induced by the decomposition of $V(n)$ into subsquares $\{V_{\ell,i}(n)\}_{\ell,i}$, delineated by dashed and dotted lines. Level 0
contains the root node of $G$.



clear from the context. We use bold font to denote matrices whenever the matrix structure is of importance. 
We use the $\dagger$ symbol to denote the complex conjugate of a matrix.

III. Main Results 

We now present the main results of this paper. In Section III-A, we provide an inner and, for large
path-loss exponents $\alpha > 6$, a matching (in the scaling sense) outer bound on the capacity region $\Lambda^{\mathsf{CA}}(n)$.
In Section III-B, we discuss computational aspects. In Section III-C, we introduce the communication
scheme achieving the inner bound on $\Lambda^{\mathsf{CA}}(n)$. We analyze several example scenarios in Section III-D.

A. Caching Capacity Region 

Let $G = (V_G, E_G)$ be an undirected capacitated graph, constructed as follows. $G$ is a tree with leaf
nodes $V(n) \subset V_G$. Leaf nodes in $G$ share the same parent node in $G$ if they fall within the same subsquare
at level $L(n)$ in $A(n)$. Nodes at level $\ell$ in the tree $G$ share the same parent node if all the leaf nodes that
descend from them fall in the same subsquare at level $\ell - 1$ in $A(n)$. Note that through this construction,
each set $V_{\ell,i}(n)$ for $\ell \in \{0, \ldots, L(n)\}$, $i \in \{1, \ldots, 4^\ell\}$ is represented by exactly one internal node in $G$. This
construction is illustrated in Figure 2. Assign to each edge $e \in E_G$ at level $\ell$ in $G$ (i.e., between nodes at
levels $\ell$ and $\ell - 1$) a capacity

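$$c_e \triangleq \begin{cases} (4^{-\ell} n)^{2 - \min\{3,\alpha\}/2} & \text{if } 1 \le \ell \le L(n), \\ 1 & \text{if } \ell = L(n) + 1. \end{cases}$$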
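As a quick numerical aid, the edge capacities can be tabulated per level. The sketch below assumes the capacities take the form $(4^{-\ell} n)^{2 - \min\{3,\alpha\}/2}$ for $1 \le \ell \le L(n)$ and $1$ for the leaf edges at level $L(n) + 1$ (the latter is consistent with the node cut of capacity 2 in Example 3); the function name `edge_capacity` and its argument names are hypothetical.

```python
def edge_capacity(level: int, n: int, alpha: float, max_level: int) -> float:
    """Capacity of a tree edge between levels `level` and `level - 1`.

    Assumed form: (4^{-level} n)^{2 - min(3, alpha)/2} for internal edges,
    and 1 for the edges attached to the leaves (level == max_level + 1).
    """
    if not 1 <= level <= max_level + 1:
        raise ValueError("level must lie in {1, ..., max_level + 1}")
    if level == max_level + 1:
        return 1.0
    nodes_in_subsquare = 4.0 ** -level * n  # roughly the number of nodes below the edge
    return nodes_in_subsquare ** (2.0 - min(3.0, alpha) / 2.0)
```

Under this form, an edge near the root carries the traffic of a larger subsquare and therefore has a larger capacity than an edge near the leaves.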

With slight abuse of notation, we let $c_{u,v} \triangleq c_e$ for $(u, v) = e \in E_G$.

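As we shall see in the following, the caching capacity region $\Lambda^{\mathsf{CA}}(n)$ is closely related to the following
quantity:

$$\hat{\Lambda}^{\mathsf{CA}}(n) \triangleq \Big\{ \lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n} : \sum_{U \subset S \cap V(n)} \sum_{w \in V(n) \setminus S} \lambda^{\mathsf{CA}}_{U,w} \le \sum_{\substack{(u,v) \in E_G: \\ u \in S,\, v \notin S}} c_{u,v} \quad \forall\, S \subset V_G \Big\}.$$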

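The region $\hat{\Lambda}^{\mathsf{CA}}(n)$ is described by various subsets $S \subset V_G$. Each such subset can be understood as a cut
in the graph $G$. For every cut $S \subset V_G$, the sum-rate

$$\sum_{U \subset S \cap V(n)} \sum_{w \in V(n) \setminus S} \lambda^{\mathsf{CA}}_{U,w}$$

between nodes in $S$ and $S^c$ (i.e., across the cut) is bounded by the sum-capacity

$$\sum_{\substack{(u,v) \in E_G: \\ u \in S,\, v \notin S}} c_{u,v}$$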
Fig. 3. For $\alpha > 6$, the set $\hat{\Lambda}^{\mathsf{CA}}(n)$ approximates the caching capacity region $\Lambda^{\mathsf{CA}}(n)$ of the wireless network in the sense that $b_1(n)\hat{\Lambda}^{\mathsf{CA}}(n)$
(with $b_1(n) \ge n^{-o(1)}$) provides an inner bound to $\Lambda^{\mathsf{CA}}(n)$ and $b_2(n)\hat{\Lambda}^{\mathsf{CA}}(n)$ (with $b_2(n) \le n^{o(1)}$) provides an outer bound to $\Lambda^{\mathsf{CA}}(n)$. The
figure shows two dimensions (two of the rates $\lambda^{\mathsf{CA}}_{U,w}$) of the $2^n \times n$-dimensional sets $\hat{\Lambda}^{\mathsf{CA}}(n)$ and $\Lambda^{\mathsf{CA}}(n)$.



of edges between $S$ and $S^c$. Note that we only count traffic such that all caches $U$ are contained in
$S$.
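For very small trees, the cut description can be evaluated by brute force, which helps build intuition for which cuts are binding. The following sketch enumerates all nonempty subsets $S$ of the tree nodes and returns the largest scaling factor by which a traffic matrix can be multiplied while still satisfying every cut constraint. It is exponential in the number of tree nodes and is meant only as an illustration; the function names and the toy tree in the usage note are assumptions, not taken from the paper.

```python
from itertools import combinations


def cut_capacity(edges, S):
    """Sum of capacities of the undirected edges with exactly one endpoint in S."""
    return sum(c for (u, v), c in edges.items() if (u in S) != (v in S))


def cut_demand(lam, leaves, S):
    """Sum of rates lam[(U, w)] with all caches U inside S and destination w outside S."""
    return sum(r for (U, w), r in lam.items()
               if U <= S and w in leaves and w not in S)


def max_cut_scaling(nodes, leaves, edges, lam):
    """Largest factor rho such that rho * lam satisfies every cut constraint.

    Brute force over all nonempty subsets of `nodes`; only usable for tiny trees.
    Returns infinity if no cut carries any demand.
    """
    best = float("inf")
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            S = frozenset(subset)
            demand = cut_demand(lam, leaves, S)
            if demand > 0:
                best = min(best, cut_capacity(edges, S) / demand)
    return best
```

As a usage example, take a root `r` with children `a` and `b`, caches `u1`, `u2` below `a`, and destinations `w1`, `w2`, `w3` below `b`, with internal edge capacities 4 and leaf edge capacities 1. If every destination requests a message held at both caches at unit rate, the binding cut is the node cut $S = \{u_1, u_2\}$ with capacity 2 and demand 3, rather than any cut obtained by removing a single edge.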

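The first result states that, for all $\alpha > 2$, $\hat{\Lambda}^{\mathsf{CA}}(n)$ is an approximate inner bound to the caching capacity
region $\Lambda^{\mathsf{CA}}(n)$.

Theorem 1. Under either fast or slow fading, for any $\alpha > 2$, there exists $b_1(n) \ge n^{-o(1)}$ such that

$$b_1(n)\hat{\Lambda}^{\mathsf{CA}}(n) \subset \Lambda^{\mathsf{CA}}(n)$$

with probability $1 - o(1)$ as $n \to \infty$.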

We point out that Theorem 1 holds only with probability $1 - o(1)$ for different reasons in the fast and
slow fading cases. For fast fading, the theorem holds only for node placements that are "regular" enough.
A random node placement satisfies these regularity conditions with high probability as $n \to \infty$. For slow
fading, Theorem 1 holds under the same regularity conditions on the node placement, but moreover only
holds for almost all realizations of the channel gains.

The next result states that, for all $\alpha > 6$, $\hat{\Lambda}^{\mathsf{CA}}(n)$ is also an approximate matching outer bound to
$\Lambda^{\mathsf{CA}}(n)$.

Theorem 2. Under either fast or slow fading, for any $\alpha > 6$, there exists $b_2(n) \le n^{o(1)}$ such that

$$\Lambda^{\mathsf{CA}}(n) \subset b_2(n)\hat{\Lambda}^{\mathsf{CA}}(n)$$

with probability $1 - o(1)$ as $n \to \infty$.

As with Theorem 1, Theorem 2 holds only with high probability due to regularity conditions on the node
placement. However, unlike Theorem 1, Theorem 2 holds for all realizations of channel gains also for the
slow fading case.

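Comparing Theorems 1 and 2, we see that, for $\alpha > 6$, the caching capacity region $\Lambda^{\mathsf{CA}}(n)$ is approximately
equal to $\hat{\Lambda}^{\mathsf{CA}}(n)$ in the sense that

$$n^{-o(1)}\hat{\Lambda}^{\mathsf{CA}}(n) \subset \Lambda^{\mathsf{CA}}(n) \subset n^{o(1)}\hat{\Lambda}^{\mathsf{CA}}(n).$$

In other words, for $\alpha > 6$, $\hat{\Lambda}^{\mathsf{CA}}(n)$ scales as the caching capacity region $\Lambda^{\mathsf{CA}}(n)$. This is illustrated in Figure 3.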
B. Computational Aspects

Since we are interested in large networks, computational aspects are a concern. Note that the approximate
caching capacity region $\hat{\Lambda}^{\mathsf{CA}}(n)$ is described in terms of essentially $\Theta(4^n)$ cuts $S \subset V_G$. We show in
Example 3 in Section III-D that a description with significantly fewer cuts is not possible. In other
words, even an approximate description $\hat{\Lambda}^{\mathsf{CA}}(n)$ of the caching capacity region $\Lambda^{\mathsf{CA}}(n)$ is computationally
intractable for large values of $n$.

On the other hand, consider the simpler problem of testing membership of $\lambda^{\mathsf{CA}}$ in $\hat{\Lambda}^{\mathsf{CA}}(n)$. We now
argue that this problem can be approximately solved in an efficient manner. More precisely, we show that
$\lambda^{\mathsf{CA}} \in \hat{\Lambda}^{\mathsf{CA}}(n)$ can be checked approximately in polynomial time in the description complexity of $\lambda^{\mathsf{CA}}$.
Combined with Theorems 1 and 2, this shows that, for $\alpha > 6$, approximate membership $\lambda^{\mathsf{CA}} \in \Lambda^{\mathsf{CA}}(n)$
can be checked efficiently as well.

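Formally, define for any caching traffic matrix $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$

$$\hat{\rho}_{\lambda^{\mathsf{CA}}}(n) \triangleq \sup\{\hat{\rho} \ge 0 : \hat{\rho}\lambda^{\mathsf{CA}} \in \hat{\Lambda}^{\mathsf{CA}}(n)\}.$$

Membership $\lambda^{\mathsf{CA}} \in \hat{\Lambda}^{\mathsf{CA}}(n)$ can then be evaluated by checking if $\hat{\rho}_{\lambda^{\mathsf{CA}}}(n) \ge 1$. Let $\phi_{\lambda^{\mathsf{CA}}}(n)$ be the
solution to the following linear program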

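$$\begin{aligned}
\max \quad & \phi \\
\text{s.t.} \quad & \sum_{p \in P_{U,w}} f_{p,U,w} \ge \phi\lambda^{\mathsf{CA}}_{U,w} && \forall\, U \subset V(n),\ w \in V(n), \\
& \sum_{U \subset V(n)} \sum_{w \in V(n)} \sum_{p \in P_{U,w}:\, e \in p} f_{p,U,w} \le c_e && \forall\, e \in E_G, \\
& f_{p,U,w} \ge 0 && \forall\, U \subset V(n),\ w \in V(n),\ p \in P_{U,w},
\end{aligned} \qquad (1)$$

where $P_{u,w}$ is the path in $G$ from node $u$ to node $w$ (since $G$ is a tree, there is only one such path), and
where

$$P_{U,w} \triangleq \bigcup_{u \in U} \{P_{u,w}\}$$

is the collection of paths from the caches in $U$ to $w$.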

Note that the linear program (1), and hence also $\phi_{\lambda^{\mathsf{CA}}}(n)$, can be evaluated in polynomial time in the
description length of $\lambda^{\mathsf{CA}}$ (i.e., in polynomial time in the length of the "input" of the linear program) by
setting the flow variables $f_{p,U,w}$ to zero whenever $\lambda^{\mathsf{CA}}_{U,w} = 0$ and $p \in P_{U,w}$. Moreover, using a primal-dual
algorithm, (1) can be solved efficiently in a distributed manner (see, for example, [22, Chapter 3.7]).
The following theorem shows that $\phi_{\lambda^{\mathsf{CA}}}(n)$ is a good approximation to $\hat{\rho}_{\lambda^{\mathsf{CA}}}(n)$.

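Theorem 3. Under either fast or slow fading, for any $\alpha > 2$, there exists $b_3(n) \ge n^{-o(1)}$ such that for any
$n$ and caching traffic matrix $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$

$$b_3(n)\hat{\rho}_{\lambda^{\mathsf{CA}}}(n) \le \phi_{\lambda^{\mathsf{CA}}}(n) \le \hat{\rho}_{\lambda^{\mathsf{CA}}}(n).$$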

As argued above, $\phi_{\lambda^{\mathsf{CA}}}(n)$ can be computed in polynomial time in the description length of $\lambda^{\mathsf{CA}}$. Hence
Theorem 3 shows that testing membership $\lambda^{\mathsf{CA}} \in \hat{\Lambda}^{\mathsf{CA}}(n)$ can be done approximately in polynomial time in
the description length of $\lambda^{\mathsf{CA}}$. Combined with Theorems 1 and 2, this implies that, for $\alpha > 6$, approximate
achievability of a traffic matrix $\lambda^{\mathsf{CA}}$ (i.e., testing membership $\lambda^{\mathsf{CA}} \in \Lambda^{\mathsf{CA}}(n)$) can be checked efficiently
and in a distributed fashion.
C. An Efficient Content Delivery Protocol

Theorem 1 provides an inner bound to the caching capacity region of a wireless network. Here we
describe the communication scheme achieving the inner bound. The matching outer bound shows that,
for $\alpha > 6$, this scheme is optimal in the scaling sense.

Our proposed communication scheme consists of three layers, similar to a protocol stack. From high 
to low level of abstraction, these layers will be denoted by routing layer, cooperation layer, and physical 
layer. 

From the view of the routing layer, the wireless network consists of the noiseless capacitated tree
graph $G$ defined in Section III-A (see Figure 2 there). To send a message available at the caches $U$ to its
destination $w$, the routing layer routes the message over $G$. The optimal requests of message parts from
the caches in $U$ (i.e., optimal cache selection) are found by solving the linear program (1). As pointed
out in Section III-B, this optimal cache selection can be performed efficiently by a distributed algorithm.

The cooperation layer provides the tree abstraction $G$ to the routing layer. Sending a message up or
down an edge in the tree $G$ in the routing layer corresponds in the cooperation layer to distributing or
concentrating the same message in the wireless network. Recall that the leaf nodes of $G$ are the nodes
$V(n)$ of the wireless network and that each internal node of $G$ represents some subsquare $V_{\ell,i}(n)$ of $V(n)$.
To send a message from a child node to its parent in $G$ (i.e., towards the root node of $G$), the message at
the wireless nodes in $V(n)$ represented by the child node in $G$ is distributed (over the wireless channel)
evenly among all nodes in $V(n)$ represented by the parent node in $G$. This distribution is performed by
splitting the message at each node in $V(n)$ represented by the child node in $G$ into equal size parts, and
transmitting one part to each node in $V(n)$ represented by the parent node in $G$. To send a message from
a parent node to a child node in $G$ (i.e., away from the root node of $G$), the message at the wireless nodes
in $V(n)$ represented by the parent node in $G$ is concentrated on the wireless nodes in $V(n)$ represented
by the child node in $G$. This concentration is performed by collecting at each node in $V(n)$ corresponding
to the child node in $G$ the message parts of the previously split up message located at the nodes in $V(n)$
corresponding to the parent node in $G$.
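The bookkeeping behind distribution and concentration can be sketched in a few lines. The code below is a toy model under simplifying assumptions, not the scheme of the paper: messages are byte strings, the wireless transmissions themselves are not modeled, each share is tagged with a label so that it can be reassembled, and the names `split_equal`, `distribute`, and `concentrate` are hypothetical.

```python
def split_equal(data: bytes, k: int):
    """Split `data` into k contiguous parts whose sizes differ by at most one byte."""
    q, r = divmod(len(data), k)
    parts, start = [], 0
    for j in range(k):
        end = start + q + (1 if j < r else 0)
        parts.append(data[start:end])
        start = end
    return parts


def distribute(shares, parent_nodes):
    """Moving up one tree edge: spread each share evenly over the parent's nodes.

    shares: dict mapping a share label to the bytes held for that share.
    Returns a dict mapping each parent node to {share label: part it now holds}.
    """
    held = {p: {} for p in parent_nodes}
    for label, data in shares.items():
        for p, part in zip(parent_nodes, split_equal(data, len(parent_nodes))):
            held[p][label] = part
    return held


def concentrate(held, parent_nodes, assignment):
    """Moving down one tree edge: collect the parts of each share at one child-square node.

    assignment: dict mapping a share label to the receiving node.
    Returns a dict mapping each receiving node to the reassembled bytes.
    """
    collected = {}
    for label, target in assignment.items():
        share = b"".join(held[p].get(label, b"") for p in parent_nodes)
        collected[target] = collected.get(target, b"") + share
    return collected
```

Splitting evenly across all nodes of the parent subsquare is what makes the resulting traffic uniform within each subsquare, which is the property the physical layer relies on.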

Finally, the physical layer performs this concentration or distribution of messages. Note that the kind
of traffic resulting from the operation of the cooperation layer is regular in the sense that within each
subsquare all nodes receive data at the same rate. Uniform traffic of this sort is well understood. Depending
on the path-loss exponent $\alpha$, we use either hierarchical cooperation [17], [18] (for $\alpha \in (2, 3]$) or multi-hop
communication (for $\alpha > 3$). It is this operation of each edge in the physical layer that determines the
edge capacity of the graph $G$ as seen from the routing layer.

The next example illustrates the operation of this three-layer scheme. For more details on this architecture
(in particular the cooperation and physical layers), we refer the reader to [19].

Example 2. Consider the three layers of the proposed communication architecture depicted in Figure 4.
From top to bottom in the figure, these are the routing layer, the cooperation layer, and the physical layer. 
In this example, we consider a single (U, w) pair. The set of caches U consists of a single node {u} in 
the wireless network shown at the bottom left, and its destination w is in the top right of the network. At 
the routing layer, the optimal choice of caches is in this case trivial (since there is just one cache u). The 
optimal route between u and w chosen at the routing layer is indicated in black dashed lines. Consider 
now the second edge along the path in G from u to w. The middle plane in the figure shows the induced 
behavior from using this edge in the cooperation layer. The bottom plane in the figure shows (part of) 
the corresponding actions induced in the physical layer. 

D. Example Scenarios 

Here we provide three examples illustrating various aspects of the caching capacity region. Example 3
shows that the capacity region for caching is inherently more complicated than the ones resulting from
unicast or multicast traffic. Example 4 shows that the strategy of always selecting the nearest cache can



Fig. 4. Example operation of the three layer architecture. 



be arbitrarily bad. Example 5 analyzes the impact of complete caches on the performance of the wireless
network. 

Example 3. (Insufficiency of edge cuts) 

For unicast traffic and multicast traffic, it is shown in [19] that it is sufficient to consider edge cuts in
$G$, i.e., cuts that result if we remove a single edge from $G$. By construction, $G$ has at most $2n$ edges, and
hence there are at most $2n$ such edge cuts. This contrasts with the situation for caching traffic, for which
Theorems 1 and 2 indicate that we have to consider general cuts, i.e., arbitrary subsets $S$ of $V_G$. Indeed,
the approximate capacity region $\hat{\Lambda}^{\mathsf{CA}}(n)$ is expressed in terms of essentially $\Theta(4^n)$ cuts. Comparing these
two results, one might suspect that a simpler characterization in terms of edge cuts can be found for the
caching capacity region as well. This example shows that this is not possible. In other words, the caching
capacity region is inherently more complicated than the unicast or multicast capacity regions of a wireless
network.

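Assume $V_{2,1}(n)$ and $V_{2,2}(n)$ are subsets of $V_{1,1}(n)$, and consider two nodes $u_1 \in V_{2,1}(n)$, $u_2 \in V_{2,2}(n)$.
Construct

$$\lambda^{\mathsf{CA}}_{U,w} \triangleq \begin{cases} \rho(n) & \text{if } U = \{u_1, u_2\},\ w \in V_{1,2}(n), \\ 0 & \text{else,} \end{cases}$$

for some $\rho(n) > 0$. This is illustrated in Figure 5.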




Fig. 5. Caching traffic pattern for Example 3.



The best edge cut results from removing edge $e$ in Figure 5. The cut capacity is $c_e = (n/4)^{2 - \min\{3,\alpha\}/2}$
and the sum-rate across the cut is $|V_{1,2}(n)|\rho(n)$. By Theorem 2 and for $\alpha > 6$, this shows that $\rho^*(n)$, the
largest achievable value of $\rho(n)$, is upper bounded as

$$\rho^*(n) \le n^{1 - \min\{3,\alpha\}/2 + o(1)}$$

with high probability.

On the other hand, consider the general node cut $S = \{u_1, u_2\} \subset V_G$. The cut capacity here is 2 and
the sum-rate across the cut is again $|V_{1,2}(n)|\rho(n)$. Moreover, it is easily checked that $S$ is the bottleneck
cut in $G$. Thus, for $\alpha > 2$, Theorem 1 shows that $\rho^*(n)$ is lower bounded with high probability as

$$\rho^*(n) \ge |V_{1,2}(n)|^{-1} n^{-o(1)} = n^{-1-o(1)}, \qquad (2)$$

and, for $\alpha > 6$, Theorem 2 shows that

$$\rho^*(n) \le n^{-1+o(1)}.$$

In this example, it can be shown that the correct scaling of $\rho^*(n)$ is actually

$$\rho^*(n) = n^{-1 \pm o(1)}$$

for all $\alpha > 2$ (not just $\alpha > 6$ as suggested by Theorem 2). Note that this differs substantially from the
upper bound $n^{1 - \min\{3,\alpha\}/2 + o(1)}$ obtained from the best edge cut.
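The gap between the two bounds is easy to see numerically. The sketch below ignores the $n^{\pm o(1)}$ factors and assumes, for illustration only, that the destination subsquare contains exactly $n/4$ nodes and that the level-1 edge has capacity $(n/4)^{2 - \min\{3,\alpha\}/2}$; the function names are hypothetical.

```python
def edge_cut_bound(n: int, alpha: float) -> float:
    """Per-destination rate bound from removing the level-1 edge (o(1) terms ignored)."""
    destinations = n / 4.0  # assumed size of the destination subsquare
    capacity = (n / 4.0) ** (2.0 - min(3.0, alpha) / 2.0)
    return capacity / destinations


def node_cut_bound(n: int) -> float:
    """Per-destination rate bound from the node cut S = {u1, u2} with capacity 2."""
    destinations = n / 4.0
    return 2.0 / destinations
```

For $n = 4096$ and $\alpha = 4$, the edge cut allows a per-destination rate of about $0.031$, while the node cut limits it to about $0.002$; the ratio between the two grows polynomially in $n$.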

Example 4. (Nearest-neighbor cache selection) 

A reasonable strategy of selecting caches is to request the entire message from the nearest available 
cache. In fact, this is the strategy implicitly assumed in most of the prior work considering caching in 
wireless networks cited in Section I-A. This example shows that this strategy can be arbitrarily bad.

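Assume $V_{2,1}(n)$ and $V_{2,2}(n)$ are subsets of $V_{1,1}(n)$, and $V_{2,3}(n)$ is a subset of $V_{1,2}(n)$. Consider a node
$u^* \in V_{2,2}(n)$, and label the nodes in $V_{2,1}(n) = \{w_1, w_2, \ldots\}$ and in $V_{2,3}(n) = \{u_1, u_2, \ldots\}$. Construct

$$\lambda^{\mathsf{CA}}_{U,w} \triangleq \begin{cases} \rho(n) & \text{if } U = \{u_i, u^*\},\ w = w_i \text{ for some } i, \\ 0 & \text{else,} \end{cases}$$

for some $\rho(n) > 0$.

Fig. 6. Caching traffic pattern for Example 4.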

For every $w_i$, the nearest cache is $u^*$. It can be shown that requesting the entire message from this
nearest cache results in a per-node rate of at most

$$\rho(n) \le n^{-1+o(1)}$$

for all $\alpha > 2$.

Assume now each $w_i$ uses only the more distant cache $u_i$. This achieves a value of $\rho(n)$ of

$$\rho(n) \ge n^{1 - \min\{3,\alpha\}/2 - o(1)}.$$

Applying Theorem 1 yields the same $n^{1 - \min\{3,\alpha\}/2 - o(1)}$ value of $\rho(n)$, and Theorem 2 confirms that, for
$\alpha > 6$, no scheme can achieve a better scaling. Hence

$$\rho^*(n) = n^{1 - \min\{3,\alpha\}/2 \pm o(1)}$$

for $\alpha > 6$, and, as in the previous example, it can be shown that this is the correct scaling of $\rho^*(n)$ also
for $\alpha \in (2, 6]$. This shows that the strategy of always selecting the nearest cache can result in a scaling
exponent that is considerably worse than what is achievable with optimal cache selection.

Example 5. (Complete caches) 

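Assume we randomly pick $n^\beta$ caches for $\beta \in [0, 1)$, each holding a complete copy of all the messages.
More precisely, letting $W = \{w_i\}_{i=1}^{n^\beta}$ be the collection of caches, we consider a caching traffic matrix

$$\lambda^{\mathsf{CA}}_{U,w} \triangleq \begin{cases} \cdots \\ 0 & \text{else,} \end{cases}$$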

for some $\rho(n) > 0$. In this setup, choosing the nearest cache strategy (as discussed in Example 4) results
in a per-node rate of

with probability $1 - o(1)$ as $n \to \infty$. The three-layer architecture proposed in Theorem 1 achieves the
same rate, and Theorem 2 shows that, for $\alpha > 6$, for any communication scheme

Hence, for $\alpha > 6$,

and it can be shown, as in the previous two examples, that this is the correct scaling of $\rho^*(n)$ also for
$\alpha \in (2, 6]$.

This example illustrates that in situations in which the traffic demand and location of caches are regular
enough, the strategy of selecting the nearest cache (as analyzed also in Example 4 and which is shown
there to be arbitrarily bad in general) can actually be close to optimal.

IV. Proofs 

This section contains the proofs of Theorems 1, 2, and 3. We start in Section IV-A with some auxiliary
results. Sections IV-B, IV-C, and IV-D contain the proofs of Theorems 3, 1, and 2, respectively.

A. Auxiliary Results 

In this section, we define several quantities and recall some auxiliary results needed in several of the 
proofs. 

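We first introduce a "dual" description of the various regions. Recall that for any caching traffic matrix
$\lambda^{\mathsf{CA}}$

$$\hat{\rho}_{\lambda^{\mathsf{CA}}}(n) = \sup\{\hat{\rho} \ge 0 : \hat{\rho}\lambda^{\mathsf{CA}} \in \hat{\Lambda}^{\mathsf{CA}}(n)\},$$

and define similarly

$$\rho^*_{\lambda^{\mathsf{CA}}}(n) = \sup\{\rho \ge 0 : \rho\lambda^{\mathsf{CA}} \in \Lambda^{\mathsf{CA}}(n)\}.$$

Consider a caching traffic matrix $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$ for the wireless network and note that $\lambda^{\mathsf{CA}}$ can equivalently
be treated as a traffic matrix between the leaf nodes of the graph $G$ introduced in Section III. Let
$\Lambda^{\mathsf{CA}}_G(n) \subset \mathbb{R}_+^{2^n \times n}$ be the collection of such caching traffic matrices $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$ that can be routed over
$G$. Note that $\phi_{\lambda^{\mathsf{CA}}}(n)$ as defined through the linear program (1) is equal to

$$\phi_{\lambda^{\mathsf{CA}}}(n) = \sup\{\phi \ge 0 : \phi\lambda^{\mathsf{CA}} \in \Lambda^{\mathsf{CA}}_G(n)\}.$$

It can be shown that the regions $\Lambda^{\mathsf{CA}}(n)$, $\hat{\Lambda}^{\mathsf{CA}}(n)$, and $\Lambda^{\mathsf{CA}}_G(n)$ are convex, and hence knowledge of $\rho^*_{\lambda^{\mathsf{CA}}}(n)$,
$\hat{\rho}_{\lambda^{\mathsf{CA}}}(n)$, and $\phi_{\lambda^{\mathsf{CA}}}(n)$ for every $\lambda^{\mathsf{CA}} \in \mathbb{R}_+^{2^n \times n}$ is sufficient to completely describe them.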
To analyze caching traffic in wireless networks, we shall make use of known results for unicast traffic
in such networks. Formally, a unicast traffic matrix $\lambda^{\mathsf{UC}}$ for $V(n)$ is an element of $\mathbb{R}_+^{n \times n}$, associating with
each pair $(u, w) \in V(n) \times V(n)$ the rate $\lambda^{\mathsf{UC}}_{u,w}$ at which node $u$ wants to transmit a message to node $w$.
We define the unicast capacity region $\Lambda^{\mathsf{UC}}(n) \subset \mathbb{R}_+^{n \times n}$ to be the collection of all achievable unicast traffic
matrices $\lambda^{\mathsf{UC}} \in \mathbb{R}_+^{n \times n}$. In analogy to the caching case, define $\Lambda^{\mathsf{UC}}_G(n) \subset \mathbb{R}_+^{n \times n}$ as the collection of unicast
traffic matrices $\lambda^{\mathsf{UC}} \in \mathbb{R}_+^{n \times n}$ that can be routed over the tree graph $G$.

We now introduce some regularity conditions that are satisfied with high probability by a random 
node placement. Define $\mathcal{V}(n)$ to be the collection of all node placements $V(n)$ that satisfy the following
conditions:

$r_{u,v} > n^{-1}$ for all $u, v \in V(n)$,

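$|V_{\ell,i}(n)| \le \log(n)$ for $\ell = \tfrac{1}{2}\log(n)$ and all $i \in \{1, \ldots, 4^\ell\}$,

$|V_{\ell,i}(n)| \ge 1$ for $\ell = \tfrac{1}{2}\log\big(\tfrac{n}{2\log(n)}\big)$ and all $i \in \{1, \ldots, 4^\ell\}$,

$|V_{\ell,i}(n)| \in [4^{-\ell-1}n, 4^{-\ell+1}n]$ for all $\ell \in \big\{1, \ldots, \tfrac{1}{2}\log(n)\big(1 - \log^{-1/2}(n)\big)\big\}$, $i \in \{1, \ldots, 4^\ell\}$.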

The first condition is that the minimum distance between node pairs is not too small. The second condition
is that all squares of area 1 contain at most $\log(n)$ nodes. The third condition is that all squares of area
$2\log(n)$ contain at least one node. The fourth condition is that all squares up to level $\tfrac{1}{2}\log(n)\big(1 - \log^{-1/2}(n)\big)$
contain a number of nodes proportional to their area.

The next lemma states that a random node placement satisfies these conditions with high probability. 

Lemma 4. 

$$\mathbb{P}\big(V(n) \in \mathcal{V}(n)\big) \ge 1 - o(1)$$

as $n \to \infty$.

Proof. See [19, Lemma 5]. □ 

B. Proof of Theorem 3

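We first prove the upper bound, i.e.,

$$\phi_{\lambda^{\mathsf{CA}}} \le \hat{\rho}_{\lambda^{\mathsf{CA}}}. \qquad (3)$$

Note that if $\lambda^{\mathsf{CA}} \in \Lambda^{\mathsf{CA}}_G$ then there exists a strategy to route traffic at rates $\lambda^{\mathsf{CA}}$ over $G$. This implies that
the flow across each cut $S \subset V_G$ cannot exceed the capacity of that cut. The flow across such a cut
$S$ contains at least all those requested messages that only contain caches in $S$, i.e.,

$$\sum_{U \subset S \cap V(n)} \sum_{w \in V(n) \setminus S} \lambda^{\mathsf{CA}}_{U,w}.$$

On the other hand, the capacity of the cut $S$ is equal to

$$\sum_{\substack{(u,v) \in E_G: \\ u \in S,\, v \notin S}} c_{u,v}.$$

Therefore

$$\sum_{U \subset S \cap V(n)} \sum_{w \in V(n) \setminus S} \lambda^{\mathsf{CA}}_{U,w} \le \sum_{\substack{(u,v) \in E_G: \\ u \in S,\, v \notin S}} c_{u,v}$$

for all $S \subset V_G$, and hence $\lambda^{\mathsf{CA}} \in \Lambda^{\mathsf{CA}}_G$ implies $\lambda^{\mathsf{CA}} \in \hat{\Lambda}^{\mathsf{CA}}$. Thus, $\Lambda^{\mathsf{CA}}_G \subset \hat{\Lambda}^{\mathsf{CA}}$, from which (3) follows.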
Fig. 7. Construction of the directed graph $\tilde{G}$ from the undirected graph $G$.



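We now prove the lower bound, i.e., we show that there exists $b_3(n) \ge n^{-o(1)}$ such that for any
$\lambda^{\mathsf{CA}}$

$$\phi_{\lambda^{\mathsf{CA}}} \ge b_3(n)\hat{\rho}_{\lambda^{\mathsf{CA}}}. \qquad (4)$$

Pick any $\lambda^{\mathsf{CA}}$. Since for any $b > 0$,

$$\hat{\rho}_{b\lambda^{\mathsf{CA}}} = \frac{1}{b}\hat{\rho}_{\lambda^{\mathsf{CA}}},$$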



we may assume without loss of generality that 

{U,w) 



(5) 



Recall that $G$ is an undirected capacitated graph. Construct a directed capacitated graph $\tilde{G} = (V_{\tilde{G}}, E_{\tilde{G}})$ as follows. Take the undirected graph $G$ and turn it into a directed graph by splitting each edge $e \in E_G$ into two directed edges, each with the same capacity as $e$. Add $2^n$ additional nodes to $V_{\tilde{G}}$, one for each subset $U \subset V$. Connect the new node $\tilde{u}$ corresponding to $U \subset V$ to each node $u \in U$ by a (directed) edge $(\tilde{u}, u)$ with $c_{\tilde{u},u} = \infty$. This procedure is illustrated in Figure 7. We call the directed version of $G$ that is contained in $\tilde{G}$ as a subgraph its core. Note that if some flows can be routed through $G$ then the same flows can be routed through the core of $\tilde{G}$, and if some flows can be routed through the core of $\tilde{G}$ then at least half of each flow can be routed through $G$. Hence, for scaling purposes, the two are equivalent.
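The construction just described can be sketched in code for a toy instance. This is only an illustrative sketch under stated assumptions: the function name `build_directed_graph`, the dictionary encoding of capacities, and the `("cache", U)` node labels are hypothetical, and the empty subset is omitted because its node would have no outgoing edges. Since one node is added per subset of the leaves, the construction is only practical for very small $n$.

```python
import math
from itertools import combinations

def build_directed_graph(leaves, edges):
    """Return (directed capacities, cache-node labels) for the augmented graph.

    leaves: iterable of leaf nodes V of the undirected graph G
    edges:  dict mapping undirected edges (u, v) to their capacity
    """
    directed = {}
    # Core: every undirected edge becomes two directed edges of equal capacity.
    for (u, v), c in edges.items():
        directed[(u, v)] = c
        directed[(v, u)] = c
    # One new source node per nonempty subset U of the leaves, joined to each
    # u in U by an infinite-capacity edge directed from the new node to u.
    cache_nodes = {}
    ordered = sorted(leaves)
    for r in range(1, len(ordered) + 1):
        for U in combinations(ordered, r):
            label = ("cache", frozenset(U))
            cache_nodes[frozenset(U)] = label
            for u in U:
                directed[(label, u)] = math.inf
    return directed, cache_nodes
```

With three leaves attached to a single root, this yields 6 core edges, 7 cache nodes, and 12 infinite-capacity edges (one per pair of a subset and one of its elements).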

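Now, assume we are given a caching traffic matrix $\lambda^{CA}$ for $G$. Construct a unicast traffic matrix $\lambda^{UC}$ for $\tilde{G}$ by making, for each $(U, w)$ pair in $G$ (i.e., $U \subset V$, $w \in V$), the node $\tilde{u}$ in $\tilde{G}$ corresponding to $U$ a source for $w$ with rate

$$\lambda^{UC}_{\tilde{u},w} \triangleq \lambda^{CA}_{U,w}.$$

Denote by $\Lambda^{UC}_{\tilde{G}}$ the set of feasible unicast traffic matrices for $\tilde{G}$, and set

$$\phi_{\lambda^{UC}} \triangleq \sup\{\phi \ge 0 : \phi\lambda^{UC} \in \Lambda^{UC}_{\tilde{G}}\}.$$

By construction of $\tilde{G}$ from $G$, and by the above argument relating $G$ to the core of $\tilde{G}$, we have

$$\phi_{\lambda^{CA}} \ge \frac{1}{2}\phi_{\lambda^{UC}}. \tag{6}$$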

We are thus left with the problem of analyzing unicast traffic over $\tilde{G}$. Two difficulties arise. First, $\tilde{G}$ is a directed graph. While unicast traffic over undirected graphs with $m$ nodes is well understood and $O(\log(m))$ approximation results for the capacity region of such graphs in terms of cut-set bounds are known [23], the best known approximation result for general directed graphs is (up to polylog factors) $O(m^{11/23})$ [24]. Second, the graph $\tilde{G}$ is exponentially big in $n$. More precisely, $|V_{\tilde{G}}| \ge 2^n$. Hence even a logarithmic (in the size $m$ of the graph) approximation result will only yield a polynomial approximation in $n$. Nonetheless, as we shall see, the special structure of $\tilde{G}$ can be used to obtain $\log(n)$ approximation results for $\Lambda^{UC}_{\tilde{G}}$.

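We use an idea from [25], namely that the unicast traffic problem can be reduced to a maximum sum-rate problem. More precisely, for a subset $\tilde{F} \subset V_{\tilde{G}} \times V_{\tilde{G}}$ of $(u, w)$ pairs in $\tilde{G}$, define the maximum sum-rate as

$$\sigma_{\tilde{F}} \triangleq \max_{\lambda^{UC} \in \Lambda^{UC}_{\tilde{G}}}\ \sum_{(u,w) \in \tilde{F}} \lambda^{UC}_{u,w}.$$

We now argue that for every unicast traffic matrix $\lambda^{UC}$ there exists $\tilde{F}$ such that $\sigma_{\tilde{F}}$ is not too much bigger than $\phi_{\lambda^{UC}}$.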

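First, note that $\phi_{\lambda^{UC}}$ is the solution to the following linear program

$$\begin{aligned}
\text{maximize}\quad & \phi \\
\text{subject to}\quad & \sum_{p \in P_{u,w}} f_p \ge \phi\lambda^{UC}_{u,w} && \forall\ u, w \in V_{\tilde{G}}, \\
& \sum_{p \in P:\ e \in p} f_p \le c_e && \forall\ e \in E_{\tilde{G}}, \\
& f_p \ge 0 && \forall\ p \in P,
\end{aligned} \tag{7}$$

where $P_{u,w}$ is the collection of all paths in $\tilde{G}$ from node $u$ to node $w$, and

$$P \triangleq \bigcup_{u,w} P_{u,w}.$$

The corresponding dual linear program is

$$\begin{aligned}
\text{minimize}\quad & \sum_{e \in E_{\tilde{G}}} c_e m_e \\
\text{subject to}\quad & \sum_{e \in p} m_e \ge d_{u,w} && \forall\ u, w \in V_{\tilde{G}},\ p \in P_{u,w}, \\
& \sum_{u,w \in V_{\tilde{G}}} d_{u,w}\lambda^{UC}_{u,w} \ge 1, \\
& m_e \ge 0 && \forall\ e \in E_{\tilde{G}}, \\
& d_{u,w} \ge 0 && \forall\ u, w \in V_{\tilde{G}}.
\end{aligned} \tag{8}$$

Since the all-zero solution is feasible for the primal program (7), strong duality holds.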
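Second, $\sigma_{\tilde{F}}$ is the solution to the linear program

$$\begin{aligned}
\text{maximize}\quad & \sum_{(u,w) \in \tilde{F}}\ \sum_{p \in P_{u,w}} f_p \\
\text{subject to}\quad & \sum_{p \in P:\ e \in p} f_p \le c_e && \forall\ e \in E_{\tilde{G}}, \\
& f_p \ge 0 && \forall\ p \in P,
\end{aligned}$$

and its dual is

$$\begin{aligned}
\text{minimize}\quad & \sum_{e \in E_{\tilde{G}}} c_e m_e \\
\text{subject to}\quad & \sum_{e \in p} m_e \ge d_{u,w} && \forall\ u, w \in V_{\tilde{G}},\ p \in P_{u,w}, \\
& d_{u,w} \ge 1 && \forall\ (u,w) \in \tilde{F}, \\
& m_e \ge 0 && \forall\ e \in E_{\tilde{G}}, \\
& d_{u,w} \ge 0 && \forall\ u, w \in V_{\tilde{G}}.
\end{aligned} \tag{9}$$

Again strong duality holds.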

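Let $\{m^*_e\}_{e \in E_{\tilde{G}}}$, $\{d^*_{u,w}\}_{u,w \in V_{\tilde{G}}}$ be a minimizer for the dual (8) of the unicast traffic problem. We now show how $\{m^*_e\}$, $\{d^*_{u,w}\}$ can be used to construct a solution to the dual (9) of the maximum sum-rate problem. Note that we can assume without loss of optimality that

$$d^*_{u,w} = \begin{cases} \min_{p \in P_{u,w}} \sum_{e \in p} m^*_e & \text{if } \lambda^{UC}_{u,w} > 0, \\ 0 & \text{else.} \end{cases} \tag{10}$$

Now, since $c_e = \infty$ whenever $e \in E_{\tilde{G}} \setminus E_G$, we have $m^*_e = 0$ for those edges. Since, in addition, $\lambda^{UC}_{u,w} > 0$ only if $u \in V_{\tilde{G}} \setminus V_G$ and if $w$ is a leaf node of $G$, this implies that $\{d^*_{u,w}\}_{u,w \in V_{\tilde{G}}}$ can take at most $n^2$ different nonzero values. Order these values in decreasing order

$$d^*_1 > d^*_2 > \ldots > d^*_K > d^*_{K+1} = 0$$

with $K \le n^2$, and define

$$\lambda^{UC}_k \triangleq \sum_{(u,w):\ d^*_{u,w} = d^*_k} \lambda^{UC}_{u,w}.$$

We now argue that $d^*_k \le n^2$ for all $k \in \{1, \ldots, K\}$. In fact, assume $d^*_1 > n^2$; then by (10) there exists at least one edge $\tilde{e}$ such that $m^*_{\tilde{e}} > n$. Hence

$$\sum_{e \in E_{\tilde{G}}} c_e m^*_e \ge c_{\tilde{e}} m^*_{\tilde{e}} > n$$

since $c_e \ge 1$ for all $e \in E_G$. On the other hand, let $m_e = 1$ for all edges between the leaf nodes and their parent nodes in the core of $\tilde{G}$, and let $m_e = 0$ for all other edges. Set $d_{u,w}$ as in (10) but with respect to this choice of $\{m_e\}$. Since all paths between node pairs $(u,w)$ such that $\lambda^{UC}_{u,w} > 0$ include at least one edge between the aforementioned leaf and parent nodes, we have $d_{u,w} \ge 1$ whenever $\lambda^{UC}_{u,w} > 0$, and therefore

$$\sum_{u,w \in V_{\tilde{G}}} d_{u,w}\lambda^{UC}_{u,w} \ge \sum_{u,w \in V_{\tilde{G}}} \lambda^{UC}_{u,w} = 1$$

by the normalization assumption (5). Thus $\{m_e\}$, $\{d_{u,w}\}$ is feasible for the dual (8), and has value

$$\sum_{e \in E_{\tilde{G}}} c_e m_e = n < \sum_{e \in E_{\tilde{G}}} c_e m^*_e,$$

contradicting the optimality of $\{m^*_e\}$, $\{d^*_{u,w}\}$. Hence $d^*_k \le d^*_1 \le n^2$ for all $k$.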

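We now argue that at least one $d^*_k$ is not too small. Let $k_1 < k_2 < \ldots < k_I$ be such that

$$\{k_i\}_{i=1}^{I} = \Big\{k : \lambda^{UC}_k \ge \frac{1}{2n^4}\Big\}. \tag{11}$$

Note that $I \ge 1$ since otherwise

$$\sum_{u,w \in V_{\tilde{G}}} \lambda^{UC}_{u,w} = \sum_{k=1}^{K+1} \lambda^{UC}_k < \frac{K+1}{2n^4} \le 1,$$

contradicting the normalization assumption (5). Finally, define

$$s_i \triangleq \sum_{j=1}^{i} \lambda^{UC}_{k_j}.$$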
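Using that $\{d^*_{u,w}\}$ is feasible for the dual (8), that $d^*_k \le n^2$, and that $K \le n^2$, we have

$$\begin{aligned}
\sum_{i=1}^{I} d^*_{k_i}\lambda^{UC}_{k_i} &\ge 1 - \sum_{k \notin \{k_i\}} d^*_k\lambda^{UC}_k \\
&\ge 1 - K n^2 \frac{1}{2n^4} \\
&\ge \frac{1}{2}.
\end{aligned} \tag{12}$$

We argue that this implies existence of $i$ such that

$$d^*_{k_i} \ge \frac{1}{2 s_i (1 + \ln(2n^4))}. \tag{13}$$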

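Indeed, assume (13) is false for all $i$. Then

$$\begin{aligned}
\sum_{i=1}^{I} d^*_{k_i}\lambda^{UC}_{k_i} &< \frac{1}{2(1+\ln(2n^4))} \sum_{i=1}^{I} \frac{\lambda^{UC}_{k_i}}{s_i} \\
&= \frac{1}{2(1+\ln(2n^4))} \Big(1 + \sum_{i=2}^{I} \Big(1 - \frac{s_{i-1}}{s_i}\Big)\Big) && \text{(14a)} \\
&\le \frac{1}{2(1+\ln(2n^4))} \Big(1 + \ln\big(s_I/\lambda^{UC}_{k_1}\big)\Big) && \text{(14b)} \\
&\le \frac{1}{2(1+\ln(2n^4))} \big(1 + \ln(2n^4)\big) && \text{(14c)} \\
&= \frac{1}{2},
\end{aligned}$$

where we have used that $I \ge 1$ in (14a), that $1 - x \le -\ln(x)$ for every $x > 0$ in (14b), and that $s_I \le 1$ by (5) and $\lambda^{UC}_{k_1} \ge \frac{1}{2n^4}$ in (14c). This contradicts (12), showing that (13) must hold for some $i$. Consider this value of $i$ in the following.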
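The telescoping step behind this argument, namely that for positive numbers $\lambda_1, \ldots, \lambda_I$ with partial sums $s_i$ one has $\sum_i \lambda_i/s_i \le 1 + \ln(s_I/\lambda_1)$, can be checked numerically. The following is only an illustrative sketch; the function names and test sequences are assumptions made for the example.

```python
import math

def ratio_sum(lams):
    """Compute sum_i lam_i / s_i, where s_i is the i-th partial sum."""
    total, partial = 0.0, 0.0
    for lam in lams:
        partial += lam
        total += lam / partial
    return total

def log_bound(lams):
    """Upper bound 1 + ln(s_I / lam_1) obtained from 1 - x <= -ln(x)."""
    return 1.0 + math.log(sum(lams) / lams[0])
```

For instance, for the sequence `[0.1, 0.2, 0.3, 0.4]` the left-hand side is about `2.57`, while the bound evaluates to `1 + ln(10)`, about `3.30`.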

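Now, consider the following set $\tilde{F}$ of $(u, w)$ pairs:

$$\tilde{F} \triangleq \{(u,w) : d^*_{u,w} \ge d^*_{k_i}\}.$$

Note that, by (10), $\tilde{F}$ contains only pairs $(u,w)$ such that $u \in V_{\tilde{G}} \setminus V_G$ and $w \in V \subset V_G$ (i.e., nodes in $\tilde{G}$ corresponding to leaf nodes in $G$). Set

$$d_{u,w} \triangleq \frac{d^*_{u,w}}{d^*_{k_i}}, \qquad m_e \triangleq \frac{m^*_e}{d^*_{k_i}}.$$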

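Note that for $(u, w) \in \tilde{F}$,

$$d_{u,w} = \frac{d^*_{u,w}}{d^*_{k_i}} \ge 1,$$

and that for all $u, w \in V_{\tilde{G}}$, $p \in P_{u,w}$

$$\sum_{e \in p} m_e = \frac{1}{d^*_{k_i}} \sum_{e \in p} m^*_e \ge \frac{1}{d^*_{k_i}} d^*_{u,w} = d_{u,w},$$

by feasibility of $\{d^*_{u,w}\}$ and $\{m^*_e\}$ for the dual (8). Hence, for this $\tilde{F}$, the choice of $\{m_e\}$ and $\{d_{u,w}\}$ is feasible for the dual (9). By weak duality

$$\sigma_{\tilde{F}} \le \sum_{e \in E_{\tilde{G}}} c_e m_e = \frac{1}{d^*_{k_i}} \sum_{e \in E_{\tilde{G}}} c_e m^*_e.$$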

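By (13),

$$d^*_{k_i} \ge \frac{1}{2 s_i (1 + \ln(2n^4))},$$

and, since $d^*_{k_j} \ge d^*_{k_i}$ for all $j \le i$,

$$s_i = \sum_{j=1}^{i}\ \sum_{(u,w):\ d^*_{u,w} = d^*_{k_j}} \lambda^{UC}_{u,w} \le \sum_{(u,w):\ d^*_{u,w} \ge d^*_{k_i}} \lambda^{UC}_{u,w} \triangleq \lambda^{UC}_{\tilde{F}}.$$

Therefore

$$\sigma_{\tilde{F}} \le 2\lambda^{UC}_{\tilde{F}}\,(1 + \ln(2n^4)) \sum_{e \in E_{\tilde{G}}} c_e m^*_e.$$

Since $\{m^*_e\}$ is optimal for the dual (8), and by strong duality, we also have

$$\sum_{e \in E_{\tilde{G}}} c_e m^*_e = \phi_{\lambda^{UC}},$$

and hence

$$\phi_{\lambda^{UC}} \ge \frac{1}{2(1 + \ln(2n^4))} \frac{\sigma_{\tilde{F}}}{\lambda^{UC}_{\tilde{F}}}. \tag{15}$$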

We are thus left with analyzing maximum sum-rates $\sigma_{\tilde{F}}$ in $\tilde{G}$. Now notice that, since the edges in $E_{\tilde{G}} \setminus E_G$ have infinite capacity, and since for $(\tilde{u}, w) \in \tilde{F}$ we have $\tilde{u} \in V_{\tilde{G}} \setminus V_G$ and $w \in V \subset V_G \subset V_{\tilde{G}}$, this analysis can be done by considering only the core of $\tilde{G}$. More precisely, for a collection of node pairs $\tilde{F}$ in $\tilde{G}$ as above, we construct a collection of node pairs $F$ in $G$ as follows. For each $(\tilde{u}, w) \in \tilde{F}$ with $\tilde{u}$ connected in $\tilde{G}$ with nodes $U \subset V \subset V_G$, add $(u, w)$ to $F$ for each $u \in U$. Denote by $\sigma_F$ the maximum sum-rate for $F$ in $G$. Since $G$ is the undirected version of the core of $\tilde{G}$, we have

$$\sigma_{\tilde{F}} \ge \sigma_F. \tag{16}$$

For a collection of node pairs $F$ in $G$, we call a set of edges $M \subset E_G$ a multicut for $F$ if in the graph $(V_G, E_G \setminus M)$ each pair in $F$ is disconnected. For a subset $M \subset E_G$, define

$$c_M \triangleq \sum_{e \in M} c_e.$$

It is shown in [26, Theorem 8] that if $G$ is an undirected tree, then for every $F \subset V_G \times V_G$ there exists a multicut $M$ for $F$ such that

$$\sigma_F \ge \frac{1}{2} c_M. \tag{17}$$

Combining (15), (16), and (17), we obtain that for every $\lambda^{UC}$ there exists a collection of node pairs $\tilde{F}$ in $\tilde{G}$, and a multicut $M$ for the corresponding $F$ in $G$, such that

$$\phi_{\lambda^{UC}} \ge \frac{1}{4(1 + \ln(2n^4))} \frac{c_M}{\lambda^{UC}_{\tilde{F}}}. \tag{18}$$

We now show how the edge cut $M \subset E_G$ can be transformed into a node cut $S \subset V_G$. Denote by $\{S_j\}$ the connected components of $(V_G, E_G \setminus M)$. We can assume without loss of generality that

$$M = \bigcup_j\ (S_j \times S_j^c) \cap E_G,$$

since otherwise we can remove the additional edges from $M$ to create a smaller multicut for $F$. We therefore have

$$c_M = \frac{1}{2} \sum_j c_{(S_j \times S_j^c) \cap E_G}, \tag{19}$$

since every edge in $M$ appears exactly twice in the sum on the right-hand side. Define for $S \subset V_G$

$$\lambda^{CA}_S \triangleq \sum_{U \subset S \cap V}\ \sum_{w \in V \setminus S} \lambda^{CA}_{U,w}.$$
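The counting identity in (19), that the capacity of a multicut equals half the total boundary capacity of the connected components it induces, can be illustrated on a small tree. The sketch below is not from the original text; the helper names and the example tree are assumptions made for illustration. In a tree, removing any edge separates its endpoints, so every removed edge joins two distinct components and the minimality assumption on $M$ holds automatically.

```python
def components_after_removal(nodes, edges, M):
    """Connected components of the graph after deleting the edge set M (DFS)."""
    adj = {v: set() for v in nodes}
    for (u, v) in edges:
        if (u, v) not in M and (v, u) not in M:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def boundary_capacity(edges, S):
    """Total capacity of edges with exactly one endpoint in S."""
    return sum(c for (u, v), c in edges.items() if (u in S) != (v in S))
```

For a tree with root `"r"`, internal nodes `"a"`, `"b"`, and leaves `1, 2` under `"a"` and `3, 4` under `"b"`, removing the edges `("r", "a")` and `("b", 3)` leaves three components, and half of their summed boundary capacities equals the capacity of the removed edges.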

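$M$ is a multicut for the collection $F$ induced by $\tilde{F}$, and hence for every $(\tilde{u}, w) \in \tilde{F}$ and the corresponding pair $(U, w)$, $M$ separates $w$ from all the nodes in $U$. Therefore, for each such $(U, w)$ pair, there exists an $S_j$ such that $w \in S_j$, $U \subset S_j^c$. This shows that

$$\lambda^{UC}_{\tilde{F}} \le \sum_j \lambda^{CA}_{S_j^c}. \tag{20}$$

Equations (18), (19), and (20) imply that there exists $j$ such that

$$\begin{aligned}
\phi_{\lambda^{UC}} &\ge \frac{1}{8(1 + \ln(2n^4))} \frac{\sum_j c_{(S_j \times S_j^c) \cap E_G}}{\sum_j \lambda^{CA}_{S_j^c}} \\
&\ge \frac{1}{8(1 + \ln(2n^4))} \frac{c_{(S_j^c \times S_j) \cap E_G}}{\lambda^{CA}_{S_j^c}} \\
&\ge \frac{1}{8(1 + \ln(2n^4))} \min_{S \subset V_G} \frac{c_{(S \times S^c) \cap E_G}}{\lambda^{CA}_S} \\
&= \frac{1}{8(1 + \ln(2n^4))} \hat{\rho}_{\lambda^{CA}}.
\end{aligned} \tag{21}$$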
Combined with (6), this shows that for

$$b_3(n) \triangleq \frac{1}{16(1 + \ln(2n^4))} \ge n^{-o(1)}$$

we have

$$\phi_{\lambda^{CA}} \ge b_3(n)\hat{\rho}_{\lambda^{CA}},$$

proving the lower bound in Theorem 3.
C. Proof of Theorem 1

In this section, we provide the proof of Theorem 1. Instead of proving the theorem directly, it will be convenient to work with the dual descriptions $\rho^*_{\lambda^{CA}}(n)$ and $\hat{\rho}_{\lambda^{CA}}(n)$ of $\Lambda^{CA}(n)$ and $\hat{\Lambda}^{CA}(n)$ introduced in Section IV-A. The next theorem is the dual version of Theorem 1.

Theorem 5. Under either fast or slow fading, for any $\alpha > 2$, there exists $b_1(n) = n^{-o(1)}$ such that with probability $1 - o(1)$ as $n \to \infty$ for any $n$ and caching traffic matrix $\lambda^{CA} \in \mathbb{R}_+^{2^n \times n}$

$$b_1(n)\hat{\rho}_{\lambda^{CA}}(n) \le \rho^*_{\lambda^{CA}}(n).$$

Proof. The same arguments as in [19, Theorem 1] show that there exists $b(n) \ge n^{-o(1)}$ such that if a caching traffic matrix $\lambda^{CA}$ can be routed over $G$, then $b(n)\lambda^{CA}$ can be communicated reliably over the wireless network. Formally, if $V \in \mathcal{V}$ then under fast fading

$$b(n)\phi_{\lambda^{CA}} \le \rho^*_{\lambda^{CA}}, \tag{22}$$

and the same result holds for slow fading for a collection of channel gains $\mathcal{H}$ (not dependent on $\lambda^{CA}$) with

$$\mathbb{P}\big(\{h_{u,v}\}_{u,v} \in \mathcal{H}\big) \ge 1 - o(1)$$

as $n \to \infty$.

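Combining (22) with Theorem 3 and Lemma 4, we obtain that with probability

$$\mathbb{P}\big(\{h_{u,v}\}_{u,v} \in \mathcal{H},\ V \in \mathcal{V}\big) \ge 1 - o(1)$$

as $n \to \infty$, we have for any caching traffic matrix $\lambda^{CA}$

$$\begin{aligned}
\rho^*_{\lambda^{CA}} &\ge b(n)\phi_{\lambda^{CA}} \\
&\ge b(n) b_3(n)\hat{\rho}_{\lambda^{CA}}.
\end{aligned}$$

Setting

$$b_1(n) \triangleq b(n) b_3(n),$$

and recalling that $b_3(n) \ge n^{-o(1)}$ and $b(n) \ge n^{-o(1)}$, both uniformly in $\lambda^{CA}$, concludes the proof of Theorem 5. □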
D. Proof of Theorem 2

In this section, we prove Theorem 2. As before, it will be convenient to work with the dual descriptions $\rho^*_{\lambda^{CA}}(n)$ and $\hat{\rho}_{\lambda^{CA}}(n)$ of $\Lambda^{CA}(n)$ and $\hat{\Lambda}^{CA}(n)$ as introduced in Section IV-A. The next theorem is the dual version of Theorem 2.

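Theorem 6. Under either fast or slow fading, for any $\alpha > 6$, there exists $b_2(n) \le n^{o(1)}$ such that with probability $1 - o(1)$ as $n \to \infty$ for any $n$ and caching traffic matrix $\lambda^{CA} \in \mathbb{R}_+^{2^n \times n}$

$$\rho^*_{\lambda^{CA}}(n) \le b_2(n)\hat{\rho}_{\lambda^{CA}}(n).$$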

We start with some auxiliary lemmas. For subsets $S_1, S_2 \subset V(n)$, denote by $C(S_1, S_2)$ the MIMO capacity between the nodes in $S_1$ and $S_2$. Denote by $S_2^k$ the nodes in $S_2$ that are at distance between $k$ and $k + 1$ from $S_1$, i.e.,

$$S_2^k \triangleq \{v \in S_2 : \min_{u \in S_1} r_{u,v} \in [k, k+1)\}.$$
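To make the definition of the distance shells concrete, the following minimal sketch groups the nodes of $S_2$ by the integer part of their distance to the nearest node of $S_1$. The function name `distance_shells` and the coordinate dictionary are assumptions made for this illustration only.

```python
import math

def distance_shells(positions, S1, S2):
    """Map k to the set of nodes v in S2 whose distance to S1 lies in [k, k+1)."""
    shells = {}
    for v in S2:
        # Distance from v to its nearest node in S1.
        nearest = min(math.dist(positions[u], positions[v]) for u in S1)
        shells.setdefault(math.floor(nearest), set()).add(v)
    return shells
```

For example, with $S_1$ a single node at the origin and nodes of $S_2$ at distances `0.5`, `2.5`, and `3.0`, the nodes fall into shells `0`, `2`, and `3` respectively.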

Lemma 7. Under either fast or slow fading, for every $\alpha > 6$, there exists a constant $K_1$ such that for all $V(n) \in \mathcal{V}(n)$ and all $S \subset V(n)$

$$C(S, S^c) \le K_1 \log^4(n) \sum_{k=0}^{\log(n)} |S_2^k|.$$

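Proof. Set $S_1 = S$ and $S_2 = S^c$, and note that

$$S_2 = \bigcup_{k=0}^{\infty} S_2^k.$$

Let

$$H_{S_1,S_2} \triangleq [h_{u,v}]_{u \in S_1, v \in S_2}$$

be the matrix of channel gains between the nodes in $S_1$ and $S_2$. Under fast fading

$$C(S_1, S_2) \triangleq \max_{Q(H) \ge 0:\ \mathbb{E}(q_{u,u}) \le P\ \forall u \in S_1} \mathbb{E}\Big(\log\det\big(I + H_{S_1,S_2}^{\dagger} Q(H) H_{S_1,S_2}\big)\Big),$$

and under slow fading

$$C(S_1, S_2) \triangleq \max_{Q \ge 0:\ q_{u,u} \le P\ \forall u \in S_1} \log\det\big(I + H_{S_1,S_2}^{\dagger} Q H_{S_1,S_2}\big).$$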

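Applying the generalized Hadamard inequality, we obtain that under either fast or slow fading

$$C(S_1, S_2) \le C\big(S_1, \cup_{k=0}^{\log(n)} S_2^k\big) + C\big(S_1, \cup_{k > \log(n)} S_2^k\big). \tag{23}$$

Now, for the first term in (23), using Hadamard's inequality once more yields

$$C\big(S_1, \cup_{k=0}^{\log(n)} S_2^k\big) \le \sum_{k=0}^{\log(n)} \sum_{v \in S_2^k} C(S_1, \{v\}) \le \sum_{k=0}^{\log(n)} \sum_{v \in S_2^k} C(\{v\}^c, \{v\}).$$

By Lemma 7 in [19],

$$C(\{v\}^c, \{v\}) \le K\log(n)$$

for some constant $K$, and thus

$$C\big(S_1, \cup_{k=0}^{\log(n)} S_2^k\big) \le K\log(n) \sum_{k=0}^{\log(n)} |S_2^k|. \tag{24}$$

For the second term in (23), we have the following upper bound from (slightly adapting) Theorem 2.1 in [11]:

$$C\big(S_1, \cup_{k > \log(n)} S_2^k\big) \le \sum_{k > \log(n)} \sum_{v \in S_2^k} \Big(\sum_{u \in S_1} r_{u,v}^{-\alpha/2}\Big)^2.$$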

For $v \in S_2^k$, the (open) disk of radius $k$ around $v$ does not contain any node in $S_1$ (by definition of $S_2^k$). Moreover, since $V \in \mathcal{V}$, there are at most $\log(n)$ nodes inside every subsquare of $A$ of sidelength one. Thus

$$\sum_{u \in S_1} r_{u,v}^{-\alpha/2} \le K\log(n) k^{2-\alpha/2}$$

for some constant $K$ independent of $S_1$ and $k$. Therefore,

$$C\big(S_1, \cup_{k > \log(n)} S_2^k\big) \le \sum_{k > \log(n)} |S_2^k| K^2 \log^2(n) k^{4-\alpha}. \tag{25}$$

Consider now some $v \in S_2^k$ with $k > \log(n)$, and let $u^*$ be the closest point in $S_1$ to $v$. Since $v \in S_2^k$, we must have

$$r_{u^*,v} \in [k, k+1).$$

Consider the (open) disk of radius $r_{u^*,v}$ around $v$ and the disk of radius $\log(n)$ around $u^*$. Since $u^*$ is the closest node to $v$ in $S_1$, all nodes in the disk around $v$ are in $S_2$. Moreover, the intersection of the two disks has an area of at least a constant multiple of $\log^2(n)$. Since $V \in \mathcal{V}$, this implies that this intersection must contain at least one point, say $\tilde{v}$, and by construction

$$\tilde{v} \in \bigcup_{k=0}^{\log(n)} S_2^k.$$

This shows that for every node $v$ in $S_2^k$ there exists a node $\tilde{v}$ in $\cup_{k=0}^{\log(n)} S_2^k$ such that

$$r_{v,\tilde{v}} \in [k - \log(n), k + 1).$$

Now, since $V \in \mathcal{V}$, for every node $\tilde{v}$, there are at most

$$2\pi(k+1)(\log(n) + 5)\log(n) \le K' k \log^2(n)$$

nodes at distance $[k - \log(n), k + 1)$. Hence the number of nodes in $S_2^k$ is at most

$$|S_2^k| \le K' k \log^2(n) \sum_{\tilde{k}=0}^{\log(n)} |S_2^{\tilde{k}}|. \tag{26}$$
Combining with (25) yields

$$\begin{aligned}
C\big(S_1, \cup_{k > \log(n)} S_2^k\big) &\le \sum_{k > \log(n)} |S_2^k| K^2 \log^2(n) k^{4-\alpha} \\
&\le K' K^2 \log^4(n) \Big(\sum_{\tilde{k}=0}^{\log(n)} |S_2^{\tilde{k}}|\Big) \sum_{k > \log(n)} k^{5-\alpha} \\
&\le K'' \log^4(n) \sum_{k=0}^{\log(n)} |S_2^k|,
\end{aligned} \tag{27}$$

for some constant $K''$, and where we have used that $\alpha > 6$. Finally, plugging (24) and (27) into (23) shows that

$$C(S_1, S_2) \le (K + K'') \log^4(n) \sum_{k=0}^{\log(n)} |S_2^k|,$$

which proves the lemma with

$$K_1 \triangleq K + K''. \qquad \square$$

The next lemma shows that, for large path-loss exponents ($\alpha > 6$), every cut is approximately achievable, i.e., for every cut there exists an achievable unicast traffic matrix that has a sum-rate across the cut that is not much smaller than the cut capacity.

Lemma 8. Under fast fading, for every $\alpha > 6$, there exists $b_4(n) \le n^{o(1)}$ and $\lambda^{UC} \in \Lambda^{UC}(n)$ such that for any $n$, $V(n) \in \mathcal{V}(n)$, and $S \subset V(n)$,

$$C(S, S^c) \le b_4(n) \sum_{u \in S} \sum_{w \notin S} \lambda^{UC}_{u,w}. \tag{28}$$

Moreover, there exists a collection of channel gains $\mathcal{H}(n)$ such that

$$\mathbb{P}\big(\{h_{u,v}\}_{u,v \in V(n)} \in \mathcal{H}(n)\big) \ge 1 - o(1)$$

as $n \to \infty$, and such that for $\{h_{u,v}\}_{u,v} \in \mathcal{H}(n)$, (28) holds for slow fading as well.
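Proof. By Lemma 7, for $V \in \mathcal{V}$

$$C(S, S^c) \le K_1 \log^4(n)\, |\{v \in S^c : r_{S,v} < \log(n) + 1\}|, \tag{29}$$

where

$$r_{S,v} \triangleq \min_{u \in S} r_{u,v}.$$

Construct a unicast traffic matrix $\lambda^{UC} \in \mathbb{R}_+^{n \times n}$ as

$$\lambda^{UC}_{u,w} \triangleq \begin{cases} \rho(n) & \text{if } r_{u,w} < \log(n) + 1, \\ 0 & \text{else,} \end{cases}$$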

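for some function $\rho(n)$. We now argue that for $\rho(n) = \Theta(\log^{-3}(n))$ there exists $b(n) \ge n^{-o(1)}$ such that $b(n)\lambda^{UC} \in \Lambda^{UC}$. This follows from [19, Theorem 1] (see also Section IX.C there), once we show that for every $\ell \in \{1, \ldots, L(n)\}$ and $i \in \{1, \ldots, 4^{\ell}\}$ we have

$$\sum_{u \in V_{\ell,i}} \sum_{w \notin V_{\ell,i}} \lambda^{UC}_{u,w} \le (4^{-\ell}n)^{2 - \min\{3,\alpha\}/2},$$

$$\sum_{u \notin V_{\ell,i}} \sum_{w \in V_{\ell,i}} \lambda^{UC}_{u,w} \le (4^{-\ell}n)^{2 - \min\{3,\alpha\}/2},$$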
and, for all $w \in V$, a corresponding per-node constraint. Since we assume that $V \in \mathcal{V}$, we have for all $w \in V$

$$\sum_{u} \lambda^{UC}_{u,w} \le K\log^3(n)\rho(n),$$

for some constant $K$. By the locality of the traffic matrix $\lambda^{UC}$, it can be shown that this is sufficient for [19, Theorem 1] to apply with $\rho(n) = \log^{-3}(n)$. Hence $b(n)\lambda^{UC} \in \Lambda^{UC}$ for fast fading, and the same holds for slow fading for some $\mathcal{H}$ with

$$\mathbb{P}\big(\{h_{u,v}\}_{u,v \in V} \in \mathcal{H}\big) \ge 1 - o(1)$$

as $n \to \infty$.

Combined with (29), this implies (28), proving the lemma. □

We are now ready for the proof of the outer bound on $\Lambda^{CA}(n)$.

Proof of Theorem 6. Consider a cut $S \subset V$ in the wireless network. Assume we allow the nodes on each side of the cut to cooperate without any restriction; this can clearly only increase $\rho^*_{\lambda^{CA}}$. The total amount of traffic that needs to be transmitted across the cut is then

$$\sum_{U \subset S} \sum_{w \notin S} \lambda^{CA}_{U,w}.$$

The maximum achievable sum-rate (with the aforementioned node cooperation) is given by $C(S, S^c)$, the MIMO capacity between the nodes in $S$ and in $S^c$. Therefore

$$\rho^*_{\lambda^{CA}} \le \min_{S \subset V} \frac{C(S, S^c)}{\sum_{U \subset S} \sum_{w \notin S} \lambda^{CA}_{U,w}}. \tag{30}$$

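We proceed by relating the cut $S$ in the wireless network to a cut $\tilde{S}$ in $G$. By Lemma 8, for $V \in \mathcal{V}$, there exists $\lambda^{UC} \in \Lambda^{UC}$ such that for fast fading

$$C(S, S^c) \le b_4(n) \sum_{u \in S} \sum_{w \notin S} \lambda^{UC}_{u,w}, \tag{31}$$

and (31) holds also for slow fading if $\{h_{u,v}\}_{u,v} \in \mathcal{H}$ (with $\mathcal{H}$ defined as in Lemma 8). By [19, Theorem 1] (see again the discussion in Section IX.C there), for $\alpha > 5$ and $V \in \mathcal{V}$, there exists $K$ such that if $\lambda^{UC} \in \Lambda^{UC}$ then $K\log^{-3}(n)\lambda^{UC} \in \Lambda^{UC}_G$, where $G$ is the tree graph defined in Section III-A.

Now, consider any $\tilde{S} \subset V_G$ such that $\tilde{S} \cap V = S$. Note that $\tilde{S}$ is a cut in $G$ separating $S$ from $V \setminus S$. Since $K\log^{-3}(n)\lambda^{UC} \in \Lambda^{UC}_G$, we thus have

$$\sum_{u \in S} \sum_{w \notin S} K\log^{-3}(n)\lambda^{UC}_{u,w} \le \sum_{(u,v) \in E_G:\ u \in \tilde{S},\ v \notin \tilde{S}} c_{u,v},$$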
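and by minimizing over the choice of $\tilde{S}$ such that $\tilde{S} \cap V = S$, we obtain

$$\sum_{u \in S} \sum_{w \notin S} K\log^{-3}(n)\lambda^{UC}_{u,w} \le \min_{\tilde{S} \subset V_G:\ \tilde{S} \cap V = S}\ \sum_{(u,v) \in E_G:\ u \in \tilde{S},\ v \notin \tilde{S}} c_{u,v}. \tag{32}$$

Combining (31) and (32) shows that

$$C(S, S^c) \le \frac{b_4(n)}{K}\log^3(n) \min_{\tilde{S} \subset V_G:\ \tilde{S} \cap V = S}\ \sum_{(u,v) \in E_G:\ u \in \tilde{S},\ v \notin \tilde{S}} c_{u,v}.$$

Together with (30), and using Lemmas 4 and 8, this yields that with probability

$$\mathbb{P}\big(\{h_{u,v}\}_{u,v} \in \mathcal{H},\ V \in \mathcal{V}\big) \ge 1 - o(1)$$

as $n \to \infty$, we have for any caching traffic matrix $\lambda^{CA}$

$$\begin{aligned}
\rho^*_{\lambda^{CA}} &\le \min_{S \subset V} \frac{C(S, S^c)}{\sum_{U \subset S} \sum_{w \notin S} \lambda^{CA}_{U,w}} \\
&\le b_2(n) \min_{S \subset V}\ \min_{\tilde{S} \subset V_G:\ \tilde{S} \cap V = S} \frac{\sum_{(u,v) \in E_G:\ u \in \tilde{S},\ v \notin \tilde{S}} c_{u,v}}{\sum_{U \subset \tilde{S} \cap V} \sum_{w \in V \setminus \tilde{S}} \lambda^{CA}_{U,w}} \\
&= b_2(n) \min_{\tilde{S} \subset V_G} \frac{\sum_{(u,v) \in E_G:\ u \in \tilde{S},\ v \notin \tilde{S}} c_{u,v}}{\sum_{U \subset \tilde{S} \cap V} \sum_{w \in V \setminus \tilde{S}} \lambda^{CA}_{U,w}}
\end{aligned}$$

with

$$b_2(n) \triangleq \frac{b_4(n)\log^3(n)}{K}.$$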



V. Conclusions 

We analyzed the influence of caching on the performance of wireless networks. Our approach is information theoretic, yielding an inner bound on the caching capacity region for all values $\alpha > 2$ of path-loss exponent, and a matching (in the scaling sense) outer bound for $\alpha > 6$. Thus, in the high path-loss regime $\alpha > 6$, this provides a scaling characterization of the complete caching capacity region. Even though this region is $2^n \times n$-dimensional (i.e., exponential in the number of nodes $n$ in the wireless network), we present an algorithm that checks approximate feasibility of a particular caching traffic matrix efficiently (in polynomial time in the description length of the caching traffic matrix). Achievability is proved using a three-layer communication architecture achieving the entire caching capacity region in the scaling sense for $\alpha > 6$. The three layers deal with optimal selection of caches, choice of amount of necessary cooperation, and noise and interference, respectively. The matching (in the scaling sense) converse proves that addressing these questions separately is without loss of order-optimality in the high path-loss regime.